Self-Exciting Point Process Modeling of Crime 
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Highly clustered event sequences are observed in certain types of crime data, such as burglary and gang violence, due to crime-specific 
patterns of criminal behavior. Similar clustering patterns are observed by seismologists, as earthquakes are well known to increase the risk 
of subsequent earthquakes, or aftershocks, near the location of an initial event. Space-time clustering is modeled in seismology by self¬ 
exciting point processes and the focus of this article is to show that these methods are well suited for criminological applications. We first 
review self-exciting point processes in the context of seismology. Next, using residential burglary data provided by the Los Angeles Police 
Department, we illustrate the implementation of self-exciting point process models in the context of urban crime. For this purpose we use a 
fully nonparametric estimation methodology to gain insight into the form of the space-time triggering function and temporal trends in the 
background rate of burglary. 

KEY WORDS: Crime hotspot; Epidemic Type Aftershock Sequences (ETAS); Point process. 


1. INTRODUCTION 

Criminological research has shown that crime can spread 
through local environments via a contagion-like process (John¬ 
son 2008). For example, burglars will repeatedly attack clusters 
of nearby targets because local vulnerabilities are well known to 
the offenders (Bernasco and Nieuwbeerta 2005). A gang shoot¬ 
ing may incite waves of retaliatory violence in the local set 
space (territory) of the rival gang (Tita and Ridgeway 2007; 
Cohen and Tita 1999). The local, contagious spread of crime 
leads to the formation of crime clusters in space and time. 

Similarly, the occurrence of an earthquake is well known to 
increase the likelihood of another earthquake nearby in space 
and time. For example, we plot in Figure 1 a histogram of the 
times between “nearby earthquakes,” pairs of earthquake events 
separated in space by 110 kilometers or less, for all recorded 
earthquakes of magnitude 3.0 or greater in Southern California 
during 2004-2005. The histogram shows a spike at short times, 
indicating an increased likelihood of another event in the days 
following each earthquake. For a stationary Poisson process the 
distribution of times between pairs of events would be approx¬ 
imately uniform when the length of the time window is much 
larger than the longest time bin of the histogram. 

In the case of residential burglary, evidence indicates that 
an elevated risk exists for both a house that has been recently 
burgled and its neighboring houses (Farrell and Pease 2001; 
Johnson et al. 2007; Short et al. 2009). To illustrate this point 
further, we plot in Figure 1 a histogram of the times between 
“nearby burglaries,” residential burglaries separated in space by 
200 meters or less, for all recorded residential burglaries within 
an 18 km by 18 km region of the San Fernando Valley in Los 
Angeles during 2004-2005. Again we observe a spike at short 
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times, indicating an increased likelihood of victimization within 
a few hundred meters and several days of each burglary. 

Self-excitation is also found in gang violence data, as an 
event involving rival gangs can lead to retaliatory acts of vi¬ 
olence. In Figure 2, we plot the times of all recorded violent 
crimes between the gang known as “Locke Street” and the ri¬ 
val gang known as “Lowell Street” occurring between 2000 and 
2002 in the Los Angeles police district of Hollenbeck. Here we 
observe clear clustering patterns suggestive of self-excitation in 
the rate at which the two rival gangs attack each other. 

We propose that self-exciting point processes can be adapted 
for the purpose of crime modeling and are well suited to cap¬ 
ture the spatial-temporal clustering patterns observed in crime 
data. More specifically, spatial heterogeneity in crime rates can 
be treated using background intensity estimation and the self¬ 
exciting effects detected in crime data can be modeled with a 
variety of kernels developed for seismological applications or 
using nonparametric methods. In Section 2, we review self¬ 
exciting point processes in the context of seismological mod¬ 
eling. In Section 3, we present a model for residential bur¬ 
glary based upon nonparametric methods for Epidemic Type 
Aftershock-Sequences models of earthquakes. Our methodol¬ 
ogy combines the idea of stochastic declustering with Kernel 
Density Estimation in a novel way. In Section 5, we compare 
the predictive accuracy of our methodology with prospective 
crime hotspot maps. The results illustrate how crime hotspot 
maps can be improved using the self-exciting point process 
framework. We validate the methodology with a simulated 
point process in the Appendix. 

2. SELF-EXCITING POINT PROCESS MODELS 
IN SEISMOLOGY 

A space-time point process N(t, x, y) is typically character¬ 
ized via its conditional intensity X(t,x,y), which may be de¬ 
fined as the limiting expected rate of the accumulation of points 
around a particular spatial-temporal location, given the history 
Ht of all points up to time t (Daley and Vere-Jones 2003): 

k(t, x, y) = lim (E\N{(t,t + At) x (x,x+Ax) 

At,Ax,Ay±0 K L 

x (y,y+Ay)}\H t ]/(AtAxAy)). (1) 
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Figure 1. On the left, histogram of times (less than 300 days) between Southern California earthquake events of magnitude 3.0 or greater 
separated by 110 kilometers or less. On the right, histogram of times (less than 50 days) between burglary events separated by 200 meters or 
less. 


In seismology a mark Mk, the magnitude of the earthquake, is 
associated with each event ( 4 , Xk, yk) and the conditional inten¬ 
sity often takes the form 

X(t,x,y,M)=j(M)X(t,x,y), (2) 

k(t,x,y) = ii(x,y ) 

+ ^2 g(t-t k ,x-xk,y-yk‘,M k ). (3) 

{k:t k <t} 

Models of this type, referred to as Epidemic Type Aftershock- 
Sequences (ETAS) models, work by dividing earthquakes into 
two categories, background events and aftershock events. Back¬ 
ground events occur independently according to a stationary 
Poisson process /x(x,y), with magnitudes distributed indepen¬ 
dently of 11 according to j(M ). Each of these earthquakes then 
elevates the risk of aftershocks and the elevated risk spreads in 
space and time according to the kernel g(t, x, y, M). 



Figure 2. Times of violent crimes between two rivalry gangs in Los 
Angeles. 


Many forms for g have been proposed in the literature, 
though in general the kernel is chosen such that the elevated risk 
increases with earthquake magnitude and decreases in space 
and time away from each event. For example, the isotropic ker¬ 
nel, 


git, x, y; M) 


Kp 

{t + c)P 


e a(M-M 0 ) 

(x 2 +y +d)9' 


(4) 


is one of a variety of kernels reviewed in Ogata (1998). Here 
Kq, Mo, and a are parameters that control the number of after¬ 
shocks, c and d are parameters that control the behavior of the 
kernel at the origin, and p and q are parameters that give the 
(power law) rate of decay of g. 

Standard models for the background intensity jx(x, y) include 
spline, kernel smoothing, and Voronoi estimation (Silverman 
1986; Ogata and Katsura 1988; Okabe et al. 2000). In the case 
of fixed bandwidth kernel smoothing, the background intensity 
is estimated by 


fi(x, y) = fi ■ ^2 u (x — x k ,y — yk ; cr), (5) 

k 

where /Z is a parameter controlling the overall background rate. 
The events ( 4 , %k,yk, MO are assumed to be background events 
and in practice can be obtained through a declustering algo¬ 
rithm (Zhuang, Ogata, and Vere-Jones 2002). 

The appropriate selection of parameter values is as critical 
to the modeling process as specifying accurate forms for /a , 
g, and j. The distance in space and time over which the risk 
spreads, the percentage of background events vs. aftershocks, 
the dependence of the increased risk on magnitude size, etc., 
all can have a great impact on the predictive power of a point 
process model. Parameter selection for ETAS models is most 
commonly accomplished through maximum likelihood estima¬ 
tion, where the log-likelihood function (Daley and Vere-Jones 
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2003), 

K0) = X lo gM4, x k , y k ; 0)} 
k 


ni 


X(t,x,y; 0) dydxdt, (6) 


is maximized over all parameter sets 6. Here S x [0, 7"] is the 
space-time window of observation. 

More recently, nonparametric methods have been intro¬ 
duced for self-exciting point process estimation (Zhuang 2006; 
Marsan and Lenglin 2008). Consider space-time point data 
{(t k , x k , y k )} k =i ar *d a general self-exciting point process model 
of the form 


k(t,x,y) = fi(t,x,y)+ X 8(*-t k >x-x k ,y-y k ). (7) 

{k:tk<t} 


Assuming model correctness, the probability that event i is a 
background event, pa, is given by 

tuxuyi) 

Pii = x— -r (8) 

X(ti,Xi,yi) 

and the probability that event j triggered event i, pp, is given by 


Pji = 


g{tj - tj, Xj - Xj, yt - yj) 

HU, Xi, y t ) 


(9) 


(Zhuang, Ogata, and Vere-Jones 2002). Let P denote the matrix 
with entries pp (note that the columns sum to one). Then sto¬ 
chastic declustering can be used in the following way. Given 
an initial guess Po of the matrix P, we then have N(N + 
l)/2 probabilistic data points {(t k ,x k ,y k ,p kk )}^ l and {(t; — 
tj, Xi — Xj,yt — yj,Pji)}i>j- Given this data, a nonparametric 
density estimation procedure can be used to estimate pi from 
{(t k , x k , y k ,p kk )}%=i and g from {%■ - tj, x t - Xj, y t - yj, Pji))i>j, 
providing estimates /iq and go- We can then proceed iteratively 
as follows until convergence is achieved: 


Step 1. Estimate jx n and g n from P n -\. 

Step 2. Update P n from pi n and g n using (8) and (9). 


For example, a simple histogram estimator is used in Marsan 
and Lenglin (2008) in step 1. 


3. A SELF-EXCITING POINT PROCESS 
MODEL OF BURGLARY 


For the purpose of modeling burglary we consider an un¬ 
marked self-exciting model for the conditional intensity of the 
form 

X(t,x,y) = v(t) k i(x,y) + X 8(t-tk,x-x k ,y-y k ). (10) 

{k:t k <t} 

Here we neglect spatially localized temporal fluctuations in the 
background rate and assume that the fluctuations occur globally 
(e.g., due to weather, seasonality, time of day, etc.) In the case 
of seismology, research over a number of decades was needed 
to refine the (parametric) form of the triggering function g. For 
this reason, nonparametric methods are appealing in the con¬ 
text of crime in order to quickly gain insight into the forms of 
v, pi, and g. For this purpose we use the iterative procedure out¬ 
lined in the previous section to estimate the model, with several 
modifications. 


Because the probabilistic data {( t k , x k , y k ,pkk)} k= i and {(t; — 
tj, xt —Xj, yi —yj,Pji)}i>j is both three-dimensional and the num¬ 
ber of data points is 0(N 2 ) [where N is typically (9(1000) for 
earthquake and crime datasets], the estimation step for pi and 
g is computationally expensive. The dimensionality prevents 
straightforward use of binning methods such as the Average 
Shifted Histogram (Marsan and Lenglin use a logarithmically 
scaled histogram on a coarse grid), as many bins may have ex¬ 
tremely small, but nonzero, values (since the data is probabilis¬ 
tic, the count in each bin can be less than 1). Alternatively, the 
large size of the data set prevents efficient use of off-grid meth¬ 
ods such as Kernel Density Estimation. To get around these is¬ 
sues we use the following Monte Carlo-based iterative proce¬ 
dure: 

Step 1. Sample background events {(tf, xf,yf)}^j and off- 

spring/parent interpoint distances {(f t , x°, y°)}fZi f rom Pn- 1- 

Step 2. Estimate v n , pi n and g n from the sampled data using 
variable bandwidth Kernel Density Estimation. 

Step 3. Update P n from v n , pi n and g n using (8) and (9). 

Because N k + N 0 = N, the size of the sampled data at each 
iteration allows for the use of Kernel Density Estimation. An¬ 
other issue is that the number of background and offspring 
events, N k and N a , is changing at each iteration. Thus a fixed 
bandwidth for any density estimation technique (kernel smooth¬ 
ing, histogram, etc.) wifi over-smooth at some iterations and 
under-smooth at others. Therefore we employ variable band¬ 
width KDE (alternatively Cross Validation could be used). We 
give further details of our approach and provide validation us¬ 
ing a simulated point process in the Appendix. 

4. RESULTS 

We fit the model given by Equation (10) to a dataset collected 
by the Los Angeles Police Department of 5376 reported resi¬ 
dential burglaries in an 18 km by 18 km region of the San Fer¬ 
nando Valley in Los Angeles occurring during the years 2004 
and 2005. Each burglary is associated with a reported time win¬ 
dow over which it could have occurred, often a few hour span 
(for instance, the time span over which a victim was at work), 
and we define the time of burglary to be the midpoint of each 
burglary window. 

In Figure 3, we plot the sampled interpoint distances {(tf, x°, 
y°))iZ\ on the 75th iteration of the stochastic declustering al¬ 
gorithm (see Appendix for convergence verification). The num¬ 
ber of sampled (offspring) events is 706 (13.1% of all events) 
and of these events approximately 63% are exact repeats (oc¬ 
curring at the same house). On the left, the spatial interpoint 
distances are plotted showing that elevated crime risk travels 
around 50 m-100 m from the house of an initial burglary to 
the location of direct offspring events. As discussed in Marsan 
and Lenglin (2008), the overall distance near-repeat risk travels 
is several times further due to the cascading property of self¬ 
exciting point processes. Note also that the risk travels verti¬ 
cally and horizontally (along streets), more so than it does in 
other directions. On the right, we plot the spatial (x-coordinate) 
interpoint distances against the time interpoint distances. Here 
exact-repeat burglaries, those occurring at the same house, are 
apparent along the x-axis. 
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Figure 3. Spatial (left) and space-time (right) offspring/parent interpoint distances {(tf ,x°, y °)}^ 1 sampled from P 75 . 


In Figure 4, we plot (on a logarithmic scale) the estimated 
marginals 


-// 


815(0 = gi5(t, y) dx dy 


and 


-IS 


g7500 = gi 5 (t, x, y)dydt 


computed from the KDE estimate of g at the 75th iteration. Here 
the presence of exact-repeat events can again be seen, as £75 (x) 
appears to approximate a delta distribution at the origin. The 
spike around 1-2 days in the plot of gis(0 is due to the pres¬ 
ence of fast “crime sprees,” where most likely the same burglar 
visited several neighboring houses within a time span of a few 
minutes to several days. There are also several bumps in the el¬ 
evated risk of burglary, for example, around 7 days. Here one 


possible explanation is that the routine of the burglar and/or 
the victim is such that a particular day of the week is a prefer¬ 
able time to commit the burglary. After 7-10 days, the elevated 
risk of repeat/near-repeat victimization drops to an intermedi¬ 
ate level and stays relatively flat for a time span on the order 
of several hundred days before decaying back to baseline rates. 
These results are consistent with previous quantitative studies 
of exact-repeat burglaries (Short et al. 2009). 

In Johnson (2008), the authors discuss the need to model 
risk heterogeneity and in general it is a difficult task to sepa¬ 
rate clustering due to background heterogeneity and clustering 
due to self-excitation. One benefit of using the nonparametric 
approach outlined above is that temporal and spatial changes 
in the rate of crime are automatically separated into those stem¬ 
ming from exogenous effects and those due to self-excitation. In 
Figure 5, we plot the estimated marginals V 75 (0 and sijs(x,y) 
estimated using KDE from {(t b , x b , y b )}^ 1 at the 75th iteration. 




x-coordinate (km) 


Figure 4. Marginal gjs(t) (left) and marginal £75 (v) (right) estimated using KDE based upon offspring/parent interpoint distances sampled 
from P 75 . 
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Figure 5. Background rate time marginal V 75 (t) (left) and space marginal /z 75 (x, y) (right) estimated using KDE from the background events 
sampled from P 75 . 


Here the estimated background rate exhibits temporal fluctua¬ 
tions on a time scale of months/days, separate from the fluctua¬ 
tions due to self-excitation. These fluctuations are likely caused 
by a number of factors such as seasonal, economic, and demo¬ 
graphic changes, as well as temporal variations in burglar rou¬ 
tine activities (Felson 1998). For example, residential burglary 
tends to have a higher weekday rate (when victims are at work) 
compared to weekends. 

Similarly, the background rate is also spatially variable, 
which is consistent with fixed environmental heterogeneity in 
crime opportunities, as well as variability in population density 
through space (Bernasco and Nieuwbeerta 2005). In seismol¬ 
ogy, declustered catalogs are of great interest as they can be 
used in estimating the background rate of major earthquakes. 
Declustered crime catalogs could potentially be used by police 
to distinguish between areas of a city with intrinsically high 
crime rates and areas with temporarily high crime rates (due to 
near-repeat effects). As the former arises due to structural prop¬ 
erties of a given neighborhood and the latter from behavioral 
characteristics of individual burglars, police and community re¬ 
sponses would likely need to be different in each case. 

5. CRIME FORECASTING: POINT PROCESSES 
VERSUS HOTSPOT MAPS 

Crime hotspot maps are a well-established tool for visualiza¬ 
tion of space-time crime patterns and can be used as a method 
for prediction of near-repeat crimes. Given space-time crime 
observations ( 4 , Xk, yk), crime hotspot maps are generated for a 
time interval [t — T, t ] by overlaying a density plot of the func¬ 
tion, 

X(t,x,y) = ^2 g(t-tk,x-xk,y-yk), (11) 

t—T <tk<t 

onto a city map, where g(t, x, y) is a space-time kernel. By flag¬ 
ging the areas of the city where X takes on its highest values, 
crime hotspot maps can be used to indicate which areas in the 


city are likely to contain future crimes (Bowers, Johnson, and 
Pease 2004; Chainey, Tompson, and Uhlig 2008). 

For example, in Bowers, Johnson, and Pease (2004) a coarse 
grained kernel is used that decays inversely proportional to spa¬ 
tial and temporal distance. In particular, with spatial distance 
d in units of 1 /2 cell widths and time t in units of weeks, the 
kernel in (11) is specified as 


git, d ) = 


1 

(l+OO+dj 


( 12 ) 


on the domain (t, d) e [0, 2 months] x [0,400 meters] and 0 
otherwise. Such a crime hotspot map is referred to as “prospec¬ 
tive,” as it uses past crimes, coupled with the contagious spread 
of crime (modeled by g), to estimate future relative crime risk 
across the city. It should be noted that the risk is relative because 
(11) is not a point process intensity. 

Here we compare the predictive accuracy of the self-exciting 
point process model developed in Section 3 to the prospective 
crime hotspot map given by (11)-(12). Because crime is local¬ 
ized in small regions of the city (a commercial zone with no res¬ 
idential burglary may be located 100 meters from a neighbor¬ 
hood), we find that for predictive purposes variable bandwidth 
KDE is less accurate than fixed bandwidth KDE. We there¬ 
fore estimate /x(x,y) in Equation (10) using fixed bandwidth 
Gaussian KDE, with 20-fold cross validation used to select the 
bandwidth (Silverman 1986). 

For every day k of 2005, each model assesses the risk of bur¬ 
glary within each of M 2 cells partitioning an 18 km by 18 km 
region of the San Fernando Valley in Los Angeles. Based on the 
data from the beginning of 2004 up through day k, the N cells 
with the highest risk (value of A.) are flagged yielding a predic¬ 
tion for day k + 1. The percentage of crimes falling within the 
flagged cells on day k + 1 is then recorded and used to measure 
the accuracy of each model. 

In Figure 6, on the left we plot the percentage of crimes 
predicted averaged over the forecasting year against the per¬ 
centage of flagged cells for the self-exciting point process and 
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Figure 6. Forecasting strategy comparison. Average daily percentage of crimes predicted plotted against percentage of cells flagged for 2005 
burglary using 200 m by 200 m cells. Error bars correspond to the standard error. Prospective hotspot cutoff parameters are 400 meters and 8 
weeks (left) and optimal parameters (right) are 200 meters and 39 weeks. Spatial background intensity n(x,y) smoothing bandwidth for the 
point process is 300 meters (left) selected by cross validation and 130 meters (right) selected to optimize the number of crimes predicted. 


the prospective hotspot strategy. For example, with 10% of 
the city flagged the point process and prospective hotspot cor¬ 
rectly predict 660 and 547 crimes (respectively) out of 2627. 
The difference in accuracy between the two methodologies can 
be attributed to the crime hotspot map’s failure to account for 
the background rate of crime. While prospective crime hotspot 
maps used for crime prediction attempt to quantify the conta¬ 
gious spread of crime following past events, they fail to assess 
the likelihood of future “background” events, the initial events 
that trigger crime clusters. 

In order to disentangle the dependence of model accuracy on 
parameter selection, in Figure 6 on the right we repeat the same 
prediction exercise but with parameters of each model selected 
to yield the highest number of crimes predicted (L 1 norm over 
1 through 15% of cells flagged). The optimal cutoff parameters 
for the prospective hotspot map are 200 meters and 39 weeks. 
With these parameter values, in particular the slow decay of g 
in time, Equation (11) is closer to Poisson estimation. For the 
point process model we only optimize the bandwidth used for 
li(x,y) as the computational cost of the stochastic decluster¬ 
ing algorithm is relatively high. Whereas the bandwidth is esti¬ 
mated to be approximately 300 meters using cross validation, a 
smaller bandwidth, 130 meters, provides a higher level of pre¬ 
dictive accuracy. This can be attributed to the spatially localized 
features of neighborhoods, and hence burglary. 

For all percentages of cells flagged the prospective hotspot 
map underperforms the point process, though for certain per¬ 
centages the relative underperformance is less. On the left in 
Figure 6, the prospective hotspot map performs better (relative 
to the point process) for smaller percentages of cells flagged, as 
the parameters are selected to account for near-repeat effects. 
On the right, the prospective hotspot map performs better for 
larger percentages of flagged cells, since for these parameter 
values the model is more accurately estimating fixed environ¬ 
mental heterogeneity. For crime types such as robbery and auto 


theft, where near-repeat effects play less of a role, prospective 
hotspot maps tailored for near-repeat effects are likely to be 
outperformed by simple Poisson estimation. The advantage of 
models of the form (10) is that the balance between exogenous 
and endogenous contributions to crime rates is inferred from 
the data as opposed to being imposed a priori. 

6. DISCUSSION 

We showed how self-exciting point processes from seismol¬ 
ogy can be used for the purpose of crime modeling. In the fu¬ 
ture it may be desirable to tailor point process models specifi¬ 
cally for crime, taking into account the crime type and the lo¬ 
cal geography of the city. Based upon the insights provided by 
nonparametric estimates, parametric models can be constructed 
that have advantages with respect to model fitting and simula¬ 
tion. Background rates can also be improved by incorporating 
other data types (in Johnson 2008, housing density is used to 
improve models of repeat victimization). In the case of gang 
violence, a hybrid network-point process approach may be use¬ 
ful for capturing the self-exciting effects stemming from gang 
retaliations. Here increased risk may not diffuse in geographic 
space, but instead may travel through the network space of gang 
rivalry relations. 

The methodology used in this study can be implemented for 
other applications as well, for example refining point process 
models of earthquakes. It could potentially be adapted, more 
generally, to other second-order models of point processes. 
The stochastic declustering algorithm opens up the door to 
a plethora of density estimation techniques (Silverman 1986; 
Scott 1992; Eggermont and LaRiccia 2001) that could be used 
to explore point processes in a way parametric methods do not 
allow. 

In Marsan and Lenglin (2010) it is shown that the method 
is an Expectation-Maximization (EM) type algorithm. At the 
maximization step the complete data log-likelihood function 
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decouples in terms of the background and triggering functions, 
which is why at each iteration the problem reduces to several 
decoupled density estimation problems. Several issues could 
potentially arise here, one being that the method could con¬ 
verge to a local (but not global) minimum of the observed data 
log-likelihood function. Another, as pointed out in Sornette and 
Utkin (2009), is that the sample size and domain size (relative 
to the support of the triggering kernel) play a key role in the 
accuracy of stochastic declustering. In numerical tests we have 
found that at least 0(1000) data points are needed in three di¬ 
mensions for the iterates to converge to the right densities and 
the domain needs to be several times larger than the support 
of the triggering kernel. Similar to analytical results for stan¬ 
dard density estimation, it would be useful to have convergence 
results relating sample size, branching ratio, domain size, and 
the bandwidth of the density estimators to the solution of the 
fixed-point iteration. 


APPENDIX 

Given point data (/£,%> .yjOfcLi and a self-exciting point process 
model of the form, 

X(t,x,y) = v(t)p(x,y) + ^2 S(t~tk^-x k ,y-y k ), (A.l) 

{k: t k <t) 

we iterate the following until convergence: 

Step 1. Sample background events {(t^ ,yf)}^ 1 and offspring/ 

parent interpoint distances {(tf, x°, y?)}^ from P n —\. 

Step 2. Estimate v n , /i n , and g n from the sampled data. 

Step 3. Update P n from v n , pL n , and g n using (8) and (9). 

In order to estimate v n , p, n , and g n from the sampled data, we use 
variable bandwidth Kernel Density Estimation. To estimate g n , we first 
scale the data {(/?, x?, y?)}^ to have unit variance in each coordinate 
and based upon the rescaled data compute D/, the kth nearest neighbor 
distance (three-dimensional Euclidean distance) to data point i. We 
then transform the data back to its original scale and, letting ay, <r y , 






Figure A.l. L2 error ||P„ — P n - \ ||2 (top left) and Njj, the number of sampled background events, (top right) at the nth iteration for known 
point process model. L2 error ||P n — P n —\ ||2 (bottom left) and N^, the number of sampled background events, (bottom right) at the nth iteration 
for the method applied to the 5376 burglary events in Section 3. 
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and at be the sample standard deviation of each coordinate, estimate 
the triggering function as 



1 

a x o y o t {27x)^ > /' 1 ^D i i 


( (*-*?) 2 _ (y-yf) 2 _ 

X6XP l 2a} Dj 2a} tif 2a?bf )' 

The background rate is estimated similarly, where one-dimensional 
and two-dimenional Gaussian kernels are used to estimate v n and /z n , 
respectively. In Zhuang, Ogata, and Vere-Jones (2002), the authors rec¬ 
ommending using the 10th-100th nearest neighbor distance for D*. 
Throughout we compute D[ corresponding to v using the 100th near¬ 
est neighbor distance and in higher dimensions we use the 15th nearest 
neighbor distance for D; corresponding to /z and g. 

We validate the method by simulating (A.l) with 

= ^W exp (-^) exp (-^) 

and 


g(t, x, y) = 6 oj exp (—cot) exp 



and comparing the estimates supplied by the method with the known 
distribution. The simulation was carried out by first simulating all 
background events according to the Poisson process v/z. The rest of the 
simulation was carried out iteratively, where each point of each gen¬ 
eration generates its own offspring according to the Poisson process 
g centered at the parent point. The process terminates at the nth gen¬ 
eration when all events of the nth generation lie outside of the time 
window under consideration. In order to have a realization of the point 
process at steady state, the first and last 2000 points were disregarded 
in each simulation. 

In Figure A.l, we plot the L2 error \\P n — P n ~\ \\i at the nth it¬ 
eration and the number of sampled background events N^ at the nth 
iteration against the true number of background events for one real¬ 
ization of the known point process. Here we observe that the error 
converges quickly for the first 10 iterations and then stabilizes as the 
error introduced by estimating the point process through sampling P 
cannot be reduced further (unless a deterministic iterative procedure 
is employed). We also verify that the method applied to the 5376 bur¬ 
glary events in Section 3 reached convergence in Figure A.l. Here we 
observe a similar rate of convergence for the crime data as with the 
simulated point process. 

In Table A.l, we list the exact parameter values used for the simu¬ 
lated point process and the estimates averaged over the final 10 itera¬ 
tions of the stochastic declustering algorithm for each of five simula¬ 
tions of the point process. The parameter values were selected to yield 
point patterns with scales similar to those observed in crime data. The 
parameter estimates are computed using the sample variances of the 
coordinates of {(f?, x°,y°)}^^ and the values of N^, N a . As some er¬ 
ror is due to sample variation, we plot in the last two columns the 


Table A.l. Parameter value estimates 



w 1 



0 

JZ 

Nfj est. 

Nfj true 

True values 

10.00 

0.0100 

0.1000 

0.2000 

5.7100 



Run 1 est. 

11.08 

0.0176 

0.1433 

0.2001 

5.6921 

3999.7 

4041 

Run 2 est. 

12.20 

0.0156 

0.1296 

0.1967 

5.7768 

4016.5 

4026 

Run 3 est. 

11.76 

0.0150 

0.1295 

0.1997 

5.6711 

4001.5 

4017 

Run 4 est. 

13.30 

0.0135 

0.1407 

0.2049 

5.6185 

3975.3 

4015 

Run 5 est. 

11.27 

0.0147 

0.1317 

0.2102 

5.7652 

3948.9 

3977 


estimated number of background events versus the actual number of 
background events in each of the five simulations to assess the abil¬ 
ity of the method to reconstruct the realized branching structure. In 
Figure A.2, we plot the estimated marginals of g(t,x,y ) against the 




Figure A.2. Estimated (circles) and actual (solid line) marginals of 
g(t,x,y ) on the 75th iteration. Top is the marginal g(x), in the middle 
is the marginal g(y), and lower figure is marginal g(t). 
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actual distributions on the 75 th iteration of the stochastic declustering 
algorithm. The estimated time marginal density deviates from the true 
density at the origin due to the jump discontinuity of the exponential 
distribution. However, the estimate of the parameter co is still close to 
the true value (see Table A.l). 

[Received September 2009. Revised October 2010.] 
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